Identifying Unreliable and Adversarial Workers in Crowdsourced Labeling Tasks
Authors
Abstract
In this paper, we study the problem of aggregating noisy responses from crowd workers to infer the unknown true labels of binary tasks. Unlike most prior work, which has examined this problem under the probabilistic worker paradigm, we consider a much broader class of adversarial workers with no specific assumptions on their labeling strategy. Our key contribution is the design of a computationally efficient reputation algorithm to identify and filter out such adversarial workers in crowdsourcing systems, given only the labels provided by the workers. Our algorithm uses the concept of optimal semi-matchings in conjunction with worker penalties based on label disagreements to detect outlier worker labeling patterns. We prove that our algorithm can successfully identify low-reliability workers and workers adopting deterministic strategies, and that it is robust to manipulation by worst-case sophisticated adversaries who can adopt arbitrary labeling strategies to degrade the accuracy of the inferred task labels. Finally, we show that our reputation algorithm can significantly improve the accuracy of existing label aggregation algorithms in real-world crowdsourcing datasets.
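The idea of penalizing workers by label disagreements and then filtering before aggregation can be illustrated with a small sketch. This is not the paper's algorithm: the semi-matching step is omitted, and the specific penalty rule (a worker on a conflicted task is charged the inverse of the number of workers who gave the same label), the threshold, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def disagreement_penalties(labels):
    """Average per-worker penalty over tasks that received conflicting labels.

    `labels` maps (worker, task) -> +1 or -1. The penalty rule below is an
    illustrative assumption, not the rule from the paper.
    """
    by_task = defaultdict(dict)
    for (worker, task), label in labels.items():
        by_task[task][worker] = label

    totals = defaultdict(float)
    counts = defaultdict(int)
    for votes in by_task.values():
        n_pos = sum(1 for v in votes.values() if v == 1)
        n_neg = len(votes) - n_pos
        if n_pos == 0 or n_neg == 0:
            continue  # all workers agree, so this task carries no penalty
        for worker, v in votes.items():
            same_side = n_pos if v == 1 else n_neg
            totals[worker] += 1.0 / same_side  # isolated labels cost more
            counts[worker] += 1
    return {w: totals[w] / counts[w] for w in counts}


def filter_workers(labels, threshold):
    """Drop every label from workers whose penalty exceeds `threshold`."""
    penalties = disagreement_penalties(labels)
    flagged = {w for w, p in penalties.items() if p > threshold}
    kept = {k: v for k, v in labels.items() if k[0] not in flagged}
    return kept, flagged


def majority_vote(labels):
    """Aggregate the remaining labels per task; ties resolve to +1."""
    sums = defaultdict(int)
    for (_, task), label in labels.items():
        sums[task] += label
    return {task: (1 if s >= 0 else -1) for task, s in sums.items()}
```

In this sketch, filtering acts as a preprocessing step: the labels that survive `filter_workers` can be passed to any aggregation routine, with `majority_vote` standing in for one here.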
Similar resources
Accurate Integration of Crowdsourced Labels Using Workers' Self-reported Confidence Scores
We have developed a method for using confidence scores to integrate labels provided by crowdsourcing workers. Although confidence scores can be useful information for estimating the quality of the provided labels, a way to effectively incorporate them into the integration process has not been established. Moreover, some workers are overconfident about the quality of their labels while others ar...
Crowd-Selection Query Processing in Crowdsourcing Databases: A Task-Driven Approach
Crowd-selection is essential to crowdsourcing applications, since choosing the right workers with particular expertise to carry out specific crowdsourced tasks is extremely important. The central problem is simple but tricky: given a crowdsourced task, who is the right worker to ask? Currently, most existing work has mainly studied the problem of crowd-selection for simple crowdsourced tasks su...
Multi-Objective Crowd Worker Selection in Crowdsourced Testing
Crowdsourced testing is an emerging trend in software testing, which relies on crowd workers to accomplish test tasks. Typically, a crowdsourced testing task aims to detect as many bugs as possible within a limited budget. For a specific test task, not all crowd workers are qualified to perform it, and different test tasks require crowd workers to have different experiences, domain knowledge, e...
Make Hay While the Crowd Shines: Towards Efficient Crowdsourcing on the Web
Within the scope of this PhD proposal, we set out to investigate two pivotal aspects that influence the effectiveness of crowdsourcing: (i) microtask design, and (ii) workers behavior. Leveraging the dynamics of tasks that are crowdsourced on the one hand, and accounting for the behavior of workers on the other hand, can help in designing tasks efficiently. To help understand the intricacies of...
An Information Theoretic Approach to Managing Multiple Decision Makers
Citizen science and human computation involve working with multiple, untrusted decision makers, whose performance depends on training, rewards, ability and interest. We first present methods for screening workers and selecting informative objects to label. We then demonstrate Bayesian Classifier Combination as an effective method for classifying documents using unreliable crowdsourced labels. ...
Journal title:
- Journal of Machine Learning Research
Volume 18, Issue
Pages -
Publication date 2017